
    Cross-Word Modeling for Arabic Speech Recognition

    "Cross-Word Modeling for Arabic Speech Recognition" uses phonological rules to model the cross-word problem, the merging of adjacent words that occurs in continuous speech, in order to enhance the performance of continuous speech recognition systems. The author aims to provide an understanding of the cross-word problem and how it can be handled, focusing specifically on Arabic phonology using an HMM-based classifier.

    Exploring the Performance of Tagging for the Classical and the Modern Standard Arabic

    Part-of-speech (PoS) tagging is a core component of many natural language processing (NLP) applications. PoS taggers serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger on classical (i.e., traditional or historical) Arabic. In this work, we employed the Stanford Arabic model tagger to evaluate the imperative verbs in the Holy Quran. The Stanford tagger contains 29 tags; however, this work experimentally evaluates just one of them: VB, the imperative verb tag. The testing set contains 741 imperative verbs, which appear in 1,848 positions in the Holy Quran. Despite the previously reported accuracy of the Arabic model of the Stanford tagger, which is 96.26% for all tags and 80.14% for unknown words, the experimental results show that this accuracy is only 7.28% for the imperative verbs. This result motivates further research to expose why the tagging is so severely inaccurate for classical Arabic. The performance decline may indicate the necessity of distinguishing between classical Arabic and MSA training data for NLP tasks.
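    The evaluation described above measures accuracy restricted to a single tag. A minimal sketch of that computation, using hypothetical gold and predicted tag sequences (illustrative data only, not the paper's test set):

```python
# Per-tag accuracy sketch: hypothetical gold vs. predicted tags for a
# handful of tokens (illustrative data, not the paper's Quranic test set).
gold = ["VB", "VB", "NN", "VB", "IN", "VB"]
pred = ["NN", "VB", "NN", "NN", "IN", "NN"]

def tag_accuracy(gold, pred, tag):
    """Accuracy restricted to positions whose gold label is `tag`."""
    positions = [i for i, g in enumerate(gold) if g == tag]
    if not positions:
        return 0.0
    correct = sum(1 for i in positions if pred[i] == tag)
    return correct / len(positions)

# 1 of the 4 gold-VB positions is predicted as VB -> 0.25
print(tag_accuracy(gold, pred, "VB"))
```

    Restricting accuracy to one tag in this way is what makes the paper's 7.28% figure possible alongside the tagger's 96.26% overall accuracy: the overall number is dominated by frequent, easy tags.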

    Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study

    This paper discusses part-of-speech (PoS) tagging for Arabic prepositions. Arabic has a number of predefined sets of particles, such as particles of Nasb, particles of Jazm, and particles of Jarr (also called prepositions). Each set has a particular role in the context in which it appears. In general, PoS tagging is the process of assigning a tag to each word (e.g., noun, verb, particle) based on its context. PoS tagging is a beneficial tool for many natural language processing (NLP) toolkits. For instance, it is used in syntactic parsing to validate the grammar of the sentence in question. It also helps to recover the intended meaning through textual analysis for further processing in search engines. Many other language processing applications utilize PoS tagging, such as machine translation, speech synthesis, speech recognition, and diacritization. Hence, the quality of many NLP applications depends on the accuracy of the tagging system they use. This study therefore examines the Stanford tagger, exploring its tag set on the text under examination and its performance in tagging Arabic prepositions. This study also discusses the weaknesses of the Stanford tagger: it does not handle the merging case, in which a preposition joins with an adjacent word to form a single word. Another concern is that the Stanford tagger assigns a single tag to particles that differ in linguistic function, such as those of Jarr and Jazm. Through our inductive study of prepositions in terms of linguistic functions such as Jazm and Istifham (interrogation), we did not note differences in the tagging of prepositions like "to" (Ű„Ù„Ù‰) and "in" (في). Other prepositions are also difficult to distinguish unless they are contextualized; these include "until" (Ű­ŰȘى) and "except" (Űčۯۧ). This shows that the tagging system is inaccurate and that keeping tagging systems up to date is vital; hence the significance of our research.
In this work, we used the Holy Quran to evaluate the performance of the Stanford System in tagging prepositions in the Quran. This work encourages more research on tagging other Arabic prepositions to explore the compatibility between the tagging symbols employed in the Stanford System and the prepositions used in the Arabic language in general.

    Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing

    Cosine similarity is one of the most popular distance measures in text classification problems. In this paper, we use this measure to investigate the performance of Arabic text classification. For textual features, the vector space model (VSM) is generally used to represent textual information as numerical vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique, as it maintains semantic relationships between words. Hence, we used the singular value decomposition (SVD) method to extract textual features based on LSI. In our experiments, we compared several well-known classification methods: NaĂŻve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We used a corpus that contains 4,000 documents on ten topics (400 documents per topic). The corpus contains 2,127,197 words, of which about 139,168 are unique. The testing set contains 400 documents, 40 per topic. As a weighting scheme, we used Term Frequency-Inverse Document Frequency (TF.IDF). This study reveals that the classification methods that use LSI features significantly outperform the TF.IDF-based methods. It also reveals that k-Nearest Neighbors (based on the cosine measure) and Support Vector Machine are the best-performing classifiers.
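    The pipeline described above (LSI features via SVD, compared with cosine similarity) can be sketched on a toy term-document matrix. The matrix, its labels, and the choice of k = 2 latent dimensions are illustrative assumptions, and TF.IDF weighting is omitted for brevity:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); raw counts
# stand in for the TF.IDF weights used in the paper.
A = np.array([
    [2, 0, 1, 0],   # term occurring in documents 0 and 2
    [1, 0, 2, 0],   # term occurring in documents 0 and 2
    [0, 3, 0, 1],   # term occurring in documents 1 and 3
    [0, 1, 0, 2],   # term occurring in documents 1 and 3
], dtype=float)

# LSI feature extraction: truncated SVD keeping k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents 0 and 2 share vocabulary, so their LSI vectors should be far
# more similar than those of documents 0 and 1, which share no terms.
print(cosine(doc_vectors[0], doc_vectors[2]) > cosine(doc_vectors[0], doc_vectors[1]))
```

    A cosine-based k-Nearest Neighbors classifier, one of the paper's best performers, would simply assign a test document the majority label among the training documents whose LSI vectors have the highest cosine similarity to it.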